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Abstract. We link nonlinear manifold learning techniques for data analysis/compression with 
model reduction techniques for evolution equations with time scale separation. In particular, we 
demonstrate a "nonlinear extension" of the POD-Galerkin approach to obtaining reduced dynamic 
models of dissipative evolution equations. The approach is illustrated through a reaction-diffusion 
PDE, and the performance of different simulators on the full and the reduced models is compared. 
We also discuss the relation of this nonlinear extension with the so-called nonlinear Galerkin methods 
developed in the context of Approximate Inertial Manifolds. 
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1. Introduction. The purpose of this paper is to extend established model re- 
duction methods for large-scale dynamical systems characterized by separation of time 
scales by linking them to recently developed (nonlinear) manifold learning techniques- 
in particular, with the diffusion map (DMAP) approach of Coifman and coworkers 
[SJ [B] ; see also pQ . 

The general setting involves a high-dimensional, stiff system of differential equa- 
tions of the form 

l-m (i.D 

with / : R d — > R d . We will focus on problems where this system arises in the 
context of discretizing a dissipative partial differential equation: an equation of the 
form 

^-u + Au = F(u), (1.2) 

where A is a positive self-adjoint operator with a discrete spectrum and F(u) is a "well- 
behaved" function of u El HH 023 [HI [46] . We chose this type of example because 
we will later draw some analogies between our approach and the so-called nonlinear 
Galerkin reduction techniques [16l [TH [26] [TBI HS] developed in precisely this context 
of Approximate Inertial Manifolds. We note, however, that the reduction procedure 
we will discuss can also be attempted for large systems of ODEs characterized by 
separation of time scales (and the associated stiffness) that arise in other situations- 
for example, in large, complicated chemical kinetic networks. 
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For our prototypical system (equation there exists a slow, attracting, in- 

variant manifold to which trajectories quickly decay. In the framework of Approximate 
Inertial Manifolds, this means that when solutions to equation (11.21) are projected onto 
the complete set of eigenfunctions of A, the long-term behavior of the solution com- 
ponents in the higher eigenfunctions can be parameterized by (depicted as a graph 
of a function over) the solution components in the leading eigenfunctions. The result 
is a low-dimensional manifold in an infinite-dimensional space (or, for truncated ap- 
proximations, in a high-dimensional space). Given a collection of points sampled from 
such a manifold, we wish to develop a reduced set of dynamic equations that describe 
the dynamics on this manifold. Ideally, the dimensionality of this reduced set would 
be the "intrinsic" dimension of the long-term dynamics: the dimension of the mani- 
fold. Nonlinear Galerkin techniques share this goal, but they are not based on data 
sampling on the manifold; they are based instead on approximating it as a graph of a 
function above the first few eigenfunctions of the operator (or eigenvectors in the case 
of its discretization). What we propose here is an extension of what is referred to as 
"the POD-Galerkin approach" -sampling data on the manifold, using Principal Com- 
ponent Analysis (PCA) to compress the data (see, e.g. [24]), and performing (various 
forms of) Galerkin projection of the dynamics on the leading POD modes to obtain 
reduced models of the long-term dynamics themselves (see, e.g., [231 HH 1301 131 133] )• 
Since principal component analysis passes optimal (in a well-defined sense) approx- 
imating hyperplanes through the data points, one can easily envision intrinsically 
low-dimensional (but curved) manifolds that require much higher dimensional hyper- 
planes to successfully embed them. Remedying this potential discrepancy between 
intrinsic manifold dimension and the lowest-dimensional POD-Galerkin that can suc- 
cessfully reproduce the long-term dynamics is the focus of this work. 

Modern manifold learning techniques can be thought of as (nonlinear) extensions 
of PCA, in the sense that they "learn" the geometry of the low-dimensional mani- 
folds on which the data lie, and pass optimal (again in a well-defined sense) nonlinear 
approximating manifolds through the data points. Our procedure utilizes diffusion 
maps (DMAPs) to learn the geometry of the slow manifold from simulation data, 
and the related Nystrom extension to rewrite the dynamics of exploiting this 
geometry. Some interpolation (in the form of the Nystrom extension) is required, but 
it is performed in the reduced-dimension embedding space (as opposed to the original, 
high-dimensional, data space). Model dimensionality as well as model equation stiff- 
ness is thus hopefully reduced, so that both explicit and implicit temporal integration 
methods may exhibit savings over the original, full model simulation. The premise is 
that the "basis functions" provided by the DMAP process can be (sometimes signifi- 
cantly) fewer in number than the basis functions provided by methods such as POD, 
since they represent the true underlying dimensionality of the slow manifold. 

Reducing the model with the help of these "nonlinear" modes can help identify 
significant components of the dynamics, enhance intuition about the dynamical sys- 
tem behavior, and faciliate computations. We will see, however, that the nonlinearity 
of the reduction technique also gives rise to certain difficulties, which may offset the 
benefit of the "more parsimonious" manifold parametrization. Other approaches to 
model reduction can be analytical, such as "quasi-steady-state" /partial-equilibrium 
assumptions [39l [41] , or computer-assisted, such as the rate-controlled constrained 
equilibrium method (RCCE) [27], computational singular perturbation (CSP) [32] . 
intrinsic low-dimensional manifolds (ILDM) [31] (essentially, first-order CSP), func- 
tional iteration [30] , and even conditions on various derivatives [HI HOI HOI 1201 1331 HZ] • 
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The paper is organized as follows. We start with a concise description of DMAPs 
for manifold learning using high-dimensional data. We then briefly review the POD- 
Galerkin approach, and present our method as its natural nonlinear analogue. We 
then compare our method with nonlinear Galerkin reduction techniques through an 
illustrative example, and conclude with a brief summary and discussion of open prob- 
lems and possible extensions. 

2. Brief introduction to DMAPs. Diffusion maps have recently emerged as 
a fast, robust, nonlinear dimensionality reduction tool [5J[3]; see also PQ. Starting 
with an ensemble of data (e.g. points in a high-dimensional space), diffusion maps 
(DMAPs) help determine whether they can be embedded in a lower-dimensional space, 
and also find and parametrize a "best" (and possibly nonlinear) low-dimensional mani- 
fold that (approximately) contains the data. In actuality, the only input to the DMAP 
algorithm is a scalar similarity measure between each pair of entries in the data en- 
semble; therefore, DMAPs can also be used on data that are not necessarily points 
in Euclidean space [TTJ [T31 ES] . In our context, however, the data ensemble does con- 
sist of points (represented as vectors) sampled from ODE trajectories; we therefore 
present the DMAP algorithm for data ensembles that consist of vectors Xi <E R d . 
Typically, the similarity measure becomes negligible beyond a local neighborhood of 
each data point; this is related to the notion that large Euclidean distances may not 
reliably approximate geodesic distances on a general nonlinear manifold. These pair- 
wise similarities are used in the DMAP algorithm to construct a matrix whose leading 
eigenvectors nonlinearly embed the data set in a lower-dimensional Euclidean space. 
Euclidean distance in the new, reduced space approximates the diffusion distance, a 
quantity related to manifold geodesic distance (for a rigorous definition of the diffu- 
sion distance see [7])- The DMAP can thus provide a global manifold parametrization 
based on local information only, much like the unfolding of a crumpled towel to a two- 
dimensional rectangle. 

To construct a low-dimensional embedding for a data set of M individual points 
(represented as d-dimensional real vectors, X\,...,Xm) we start with a similarity mea- 
sure between each pair of vectors Xi, Xj. The similarity measure is a nonnegative 
quantity W(i,j) = W(j, i) satisfying certain additional "admissibility conditions" [5]. 
As a concrete example, consider the Gaussian and Epanechnikov similarity measures 
(based on the standard L 2 norm): 



W(i,j) 



exp 



Xi — Xi 



(2.1) 



and 



W(i,j) = (l- 



Xi — A, 



/e 2 )l 



{ \\Xi-Xj <e}- 



(2.2) 



A weighted Euclidean norm may be chosen over the standard L 2 norm in situations 
where the values of different components of X may vary over disparate orders of 
magnitude, e defines a characteristic scale which quantifies the "locality" of the 
neighborhood within which Euclidean distance can be used as the basis of a meaningful 
similarity measure 5 . A systematic approach to determining appropriate e values is 
discussed below. 
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Next, we define the diagonal matrix D by 

M 

D(i,i) = ^2w{i,k), 
fe=i 

and then we compute the first few right eigenvectors corresponding to the largest 
eigenvalues of the stochastic matrix 

K = D _1 W. 

In MATLAB, for instance, this can be done with the command [V, L] = eigs(K,n), 
where n is the number of top eigenvalues we wish to keep (we typically are only 
interested in the first few). 

This gives a set of real eigenvalues Ao > Ai > ... > A n > ... > with corresponding 
eigenvectors {V'jIjLo- Since K is stochastic, Ao = 1 and i/'o = [1 1 ... 1] T . The n- 
dimensional representation of a particular cZ-dimensional data vector, A^, is given by 
the diffusion map : R d — > R™, where 

$ n (x i ) = [\{$\\i$\...,\^}, 

a mapping which is only defined on the M recorded data vectors. Here, t represents 
the "diffusion time"; to keep things simple, we choose t = X. In other words, the 
vector Xi is mapped to a vector whose first component is the ith component of the 
first nontrivial eigenvector, whose second component is the zth component of the 
second nontrivial eigenvector, etc. If a gap in the eigenvalue spectrum is observed 
between eigenvalues A n and A„+i, then \l/„ may provide a useful low-dimensional 
representation of the data set [TJ |3B]. In fact, when this gap is present (and when 
the DMAP is scaled as \t n (AQ) = [Xiif[ l \ A2 - 02*\ — > A„^n' ) ]), Euclidean distance in 
the DMAP space of only these first n eigenvectors will accurately approximate the 
diffusion distance mentioned abov^E 

We choose the value of e used in the DMAP computation by invoking the correla- 
tion dimension [22j . The assumption here is that the volume of an n-dimensional set 
scales with any characteristic length sass"; for relatively uniform sampling one might 
expect the number of neighbors less than s apart to scale similarly. We demonstrate 
this technique on a simple illustrative data set (points sampled from a one-dimensional 
curve, see Figure [2T] below) by first computing all pairwise Euclidean distances. Fig- 
ure shows L(e), the total number of pairwise distances less than e; it is clear that 
an asymptote will arise at large e (M 2 , where M is the number of points) and at 
small e (M). In the figure, the two asymptotes are smoothly connected by an ap- 
proximately straight line; the slope of this line suggests the correct dimensionality 
for our data set (here, one). The range of e values corresponding to this straight 

1 lt should be noted that sometimes, t/>; and i/jj (for some i < j < n) contain redundant infor- 
mation. Consider, for instance, a two-dimensional rectangular sheet (perhaps in a high-dimensional 
ambient space) of long length and narrow width, for which we wish to obtain a two-dimensional 
paramctrization. We would therefore desire only two eigenvectors (eigenfunctions) of diffusion, one 
corresponding to the coordinate in the width direction and one to the coordinate in the length di- 
rection. However, many of the computed eigenvalues for diffusion in the "long" direction may be 
greater than the first eigenvalue whose eigenvector corresponds to diffusion in the "width" direction. 
We may then want to ignore, in our embedding, many of the eigenvectors corresponding to the long 
direction. Measures such as mutual information have been suggested to detect such "redundant" 
eigenvectors 142 1 . 
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Fig. 2.1. Left: log-log plot of L(e), a statistic of the data set (see text) vs. e. For M data vectors, 
lime^oo L(t) is simply M 2 and lim e _s.o L(e) is simply M. These two asymptotes are connected, 
however, by an approximately straight line, e values chosen from this regime are appropriate for our 
DMAP calculations. Right: three parametrizations of the same two-dimensional spiral embedded in 
three-dimensional space (again, see text). 



segment are all acceptable in our DMAP computations: here any value of e between 
approximately 10~ 2 and 10° may be used. Figure 12.11 also schematically illustrates 
the difference between PCA-based and DMAP-based reduction; the data are in the 
form of a two-dimensional spiral embedded in three-dimensional space. Using the 
original coordinates to parametrize this spiral requires three coordinates, while using 
PCA requires two and DMAPs require only one (since the actual dimensionality of 
the spiral is one). 

3. Ambient space to DMAP space and back. In order to utilize the model 
reduction machinery provided by diffusion maps, one must be able to map back and 
forth between the original, high-dimensional ambient space and the reduced, low- 
dimensional DMAP space. From the original space to DMAP space, there exists 
a mathematically elegant approach known as Nystrom extension; the reverse map, 
however, is more difficult. 

3.1. Nystrom extension. The problem of finding the DMAP coordinates of 
a new <i-dimensional vector Anew (& point not contained in the original data set) 
is solved with the Nystrom extension. The first step is to compute all distances 
{dinevj}f£i (or at least those which do not give a negligible similarity measure) between 
our new vector and the M vectors in our data set, and set, for instance W(i, new) = 

W{new,i) = exp - (i^™) 2 r TU(i,new) = W(new 7 i) = (1 - d 2 ncw /e 2 )l{ dincw < e } 



G 
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(corresponding to equations (|2.1|) and (|2.2I) . respectively). Setting 

( M \ 1 

K(i, new) = Y"' W(k, new) W(i,new), 

the jth DMAP coordinate (there are n coordinates total) of the new vector Anew is 
given by 

1 M 

ipj(Xnew) = T~^2 K (h new) . 

' ■ t 
J %—\ 

We typically drop the n from ^ n (again, n denotes the number of DMAP coordinates 
we choose to retain, and thus, the dimension of our DMAP space), and denote this 
process of Nystrom extension as ^f(X). Clearly, this extension procedure will return 
a result even if Anew is not chosen to be exactly on our low-dimensional manifold 

3.2. Construction of an "inverse Nystrom map". An important compo- 
nent of our reduced computations below is the ability to transform from DMAP 
space back to ambient space. In other words, for given values of low-dimensional 
DMAP coordinates, we wish to find the corresponding "on manifold" point in the 
high-dimensional Euclidean space where the original data ensemble lies. Approaches 
proposed include simulated annealing |13j . Newton iteration, polynomial interpola- 
tion, interpolation using radial basis functions [38] . manifold regularization [2], and 
geometric harmonics [4] . The method we chose here is polynomial interpolation of the 
ambient coordinates over DMAP space; each coordinate of ambient space is written 
as a polynomial over the low-dimensional DMAP space. Polynomial interpolation is 
used simply because it is easy to derive order of magnitude error estimates; yet one 
should mention (even though this was not observed in our computations) that singular 
matrices can, in principle, arise in the process (see, e.g. the discussion in [38 ) The 
geometric harmonics extension scheme, akin to Fourier interpolation on manifolds, 
may very well outperform polynomial interpolation and is described below. 

Since this transformation process is analogous to an approximate inverse of the 
diffusion map (we go from low-dimensional DMAP space to high-dimensional ambient 
space), we denote it as x P~ 1 (i), or simply $~ 1 (i), for L G R™ (DMAP space). 

3.2.1. Polynomial interpolation. Suppose we wish to find A G R d on the 
manifold such that for a given L G R", 

L = ^n(X), or, as we indicated above, simply L — ^(X). 

We first determine the K closest points to L in our data set (with proximity mea- 
sured in DMAP space) and note both the DMAP coordinates {Li}fL 1 and corre- 
sponding ambient space coordinates {Xi}fL 1 G R d of these K points. These points 
are used to compute the coefficients of a local polynomial interpolation of each am- 
bient space coordinate over the DMAP coordinates; for each coordinate j = 1, 2, d 
of ambient space, we determine the coefficients for the polynomial interpolation us- 
ing {Li, x[ j ^}, {L 2 , X2 }, {Lk, A^}, and we denote the result as setting 
j[U) = p(i)(L). This procedure is relatively fast because the polynomials are con- 
structed over R™, and n has been assumed small. For example, in the reaction- 
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diffusion example that follows, we choose K = 10 and interpolate over the two- 
dimensional DMAP space using six overall (up to quadratic) terms. The six coeffi- 
cients are fitted through least squares since, for K = 10, the system is overdetermined 
(only K = 6 is required). The result of this procedure is a point ^ _1 (L) which is an 
approximation of the corresponding "true," on-manifold point. The error introduced 
by this approximation will affect our reduced dynamical model computations, as we 
will mention below. 

3.2.2. Geometric harmonics. An additional computational tool from har- 
monic analysis which can be used for interpolation and extrapolation is geometric 
harmonics [H [31] . We do not use this tool here, since it is difficult for us to derive 
order-of-error bounds; we do, however, feel that it is an important option, and -while 
we omit many significant mathematical details- we include here a brief description, for 
completeness. The geometric harmonics extension scheme, inspired by the Nystrom 
extension scheme, is a method for extending functions defined over some smaller set X 
to a larger one X [4] . This is exactly the problem of constructing the inverse Nystrom 
map; for each point in our data ensemble, we know both its DMAP coordinates and 
its ambient space coordinates (this can be viewed as d functions from DMAP space 
(R") to each coordinate in ambient space), and we now wish to know this function 
for new values of the DMAP coordinates as well. Throughout, we use / to denote the 
function we desire to extend from X to X. 

We first choose a symmetric, positive semi-definite kernel k : XxI->R such as 
k{x, y) — exp \\x — y|| 2 /e 2 ^j . Here, e quantifies the distance away from the data set 

X that we wish to be able to extend our function f(X). The key observation is that 
when / oscillates with frequency v, it cannot be extended reliably beyond a distance 
of 0(\/v). This makes intuitive sense, as interpolation of functions / which change 
rapidly should not be trusted far from the "known" values of /. For demonstration, 
a geometric harmonics extension of the function f(9) = cos(2#) is shown in Figure 
13.11 where 9 — arctan- and the set X is simply the circle in R 2 given by x 2 + y 2 = 1 . 
For e = 0.5, the extension of / onto the plane is "wider" (see Figure [3~Tj) than for the 
case e = 0.1, but as we will see, this comes with the trade-off of increased extension 
error (here, the error is measured on the "known" set X, see below). Here, it is 
worth noting that, due to the nature of the kernel k used, this formulation bears a 
strong resemblance to function approximation procedures using radial basis functions 
(RBFs) (see, e.g. QI]). 

We can restrict this kernel to X and define the linear operator K as 

K f(x) = / k(x,x)f(x)dfi(x), 
Jx 

where fj.(x) is just the measure of the set X (if the data set consists of M random sam- 
ples, n{x) = 1/M is typically used). It can be shown that this operator has a discrete 
set of eigenvalues {Xj} > (in nonincreasing order) and orthonormal eigenfunctions 
{Tj} so that for almost all x G A, 

Jx 

Over finite sets, these integrals are obviously just sums and the eigenfunctions are 
just eigenvectors. The geometric harmonics are defined as the extension of the eigen- 
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Original function (cos 26) GH extension, e = .5 GH extension, e = .1 




Fig. 3.1. The original function of f = cos(26), defined on the unit circle (left), is extended 
with e = 0.5 (middle) and e = 0.1 (right). 

functions to X via 

for Xj 0. Since Xj — > as j ' — > oo, this extension procedure becomes numerically ill- 
conditioned. We pick a condition numbeJl 1/d, and let Ss — {j such that Xj > (5 An}. 
We can now extend f(X) to f(X) as follows: 

1. Project / onto the numerically acceptable eigenfunctions via 

f(x) (f,Tj)xTj(x), where () denotes inner product. (3-1) 

2. Use Tj to extend / on X as 

f(x)=J2(fiT j )xT j (x). 

jes s 

With a fine kernel (small e) there is negligible error in the projection of / in 
equation (|3.ip (here, the error is measured on the "known" set X); the eigenvalues 
decay slowly enough so that most of the eigenfunctions can be used (Xj > 5Xq for 
most j). However, Tj(x) quickly decays to zero for any x much farther away from the 
data set X than e because there, the kernel function k practically vanishes. With a 
coarser kernel (large e), there may be more error in the projection projection of / in 
equation (|3.1[) (again measured on the "known" set X) since most eigenfunctions are 
neglected. However, the domain of extendability is much larger. Because of this, a 
multiscale extension scheme is usually used in practice. In such a scheme, / is first 
projected at a coarse scale (large e), and then the error in this initial coarse projection 
(the residual /) is projected at a finer scale (smaller e). The error in this second, finer 
projection is then projected at an even finer scale, and this process is continued until 
the total error shrinks below some desired threshold. The sum of these projections 
yields the extended / function. Often, an initial e = en is chosen, and then during 
projection i, e = 2~( l_1 )en [I]. / is therefore extended as a linear combination of 
functions which oscillate with frequency Vj and which vanish at a distance 0(1/ Vj) 
from the set X. Several results about the mathematical optimality of the extension 
of / by this procedure have been obtained [4]. 



2 We note that, as discussed in [17] . ill-conditioning is not inherent to the problem, but rather, 
to the implementation. 
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4. Dynamics in reduced spaces. As we outlined in the introduction, we begin 
with the setting of a dissipative partial differential equation of the form 

d 

—u + Au = F(u), 

operating on a function u(x,t). The usual stipulation is for F to be globally Lipshitz 
continuous and at least C 1 , while A is a self-adjoint, compact linear operator; slightly 
different requirements are possible (see, e.g., [T|)J HH [2E1 US] ) ■ After a Galerkin 
projection of this equation using the eigenfunctions of A, one obtains a stiff system 
of ODEs which describes the evolution of the coefficients of the projection of the 
solution (projections onto these eigenfunctions of A). Although the spectrum of A 
is typically countably infinite, one often truncates the eigenfunction expansion with 
only d eigenfunctions, where d is chosen based on some physical intuition or specified 
error. Under appropriate conditions, one can also obtain a low-dimensional manifold 
of dimension n < d in phase space to which trajectories are quickly attracted, due to 
the dissipative nature of the PDE [H [H H3 HI HE] • 

To illustrate the application of model reduction to dissipative PDEs, we utilize 
the Chafee-Infante reaction-diffusion equation 

u t - vu xx + u 3 - u = 0, (4.1) 

with v — 0.16 and periodic boundary conditions u(0,t) — u(n,t) — 0. This equation 
is exactly of the form of equation (jl.2[) with Au = vu xx and F(u) — u — u 3 , and it is 
known to have a two-dimensional inertial manifold |25) . 

We first perform a Galerkin projection of this equation onto the first d = 10 
Fourier modes (sinx, sin 2a;, . . ., sin 10a;), resulting in an equation of the form of 
CO): 

where y € R 10 denotes the vector of coefficients of the Galerkin projection, u(x, t) = 
y~)j—i y^'it) sin ia;. In the literature, the zth Fourier mode is usually denoted as ai in- 
stead of j/w ; this will be the convention we adopt when referring to the Fourier modes 
of the reaction-diffusion equation. Next, we obtain a data set X\, X2, £ R 10 

sampled from the two-dimensional, slow, inertial manifold by randomly generating 
initial conditions and integrating these initial conditions for a brief (t = 1) amount 
of time (sufficient enough for a decay to the slow manifold). Finally, we use these 
vectors X%, X%, . . . , Xm in order to perform the model reduction outlined in £14.11 and 
321 below. 



4.1. Proper orthogonal decomposition. Discretely sampling simulation tra- 
jectories after the initial (brief) approach to the attracting manifold will give rise to 
an ensemble of data points that have been computed and saved as vectors in R d . We 
can attempt to compress these data by taking advantage of the low dimensionality of 
the manifold that they live on; this is traditionally accomplished through the use of 
Principal Component Analysis (PCA) on the data. If we find that most of the energy 
(defined in terms of vector inner products in R d ) of the data can be spanned by the 
first n eigenvectors of the PCA decomposition, this suggests that the data can be 
efficiently described as vectors in R™ (as the components of the data in the directions 
of the first n eigenvectors). PCA, in addition to simple data compression, can also 
form the basis of a systematic way to 
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1. reduce the dynamics of the original high-dimensional (here, d-dimensional) 
dynamical system to k dimensions 

2. rewrite the correponding dynamics in the new, reduced, low-dimensional 

space 

using again a Galerkin procedure. One starts with the data set X\, X2, Xm € 
R d sampled from the (long term dynamics on the) low-dimensional, slow, invariant, 
attracting manifold. These X may come from a variety of sources. Here, we have 
chosen to focus on X which represent the coefficients of the spectral representation 
of solutions of equation (|1.2p , but these X may also be the concentrations of various 
chemical species for the slow manifold of a chemical kinetics problem. 

POD finds (for each n = 1,2, d) the projection onto the nth dimensional linear 
subspace which maximizes the variance of the projected data. Let us denote the 
projection of a vector in I £ R d onto this linear subspace as P n (X). Defining 

^ = if HkLi X[ 1 p n(X) can be written as P„(X) = c + J2k=i( x ~c,Pk)Pk- Here, 
we have (pk,Pj) = $kj> aud the {pk}f=i are the eigenvectors of the matrix XX T with 



X = 



X 1 - c X 2 - c ... X M - c 



It turns out that regardless of the final dimension n of the linear subspace, the first 
projection direction remains optimal, as does the second, third, etc.; these {pk}k=i 
do not change with n. Defining L e R" as the coefficients of the projection in this 
reduced space, LW = (X — c,pi), iP^ = (X — c,p2), etc., we write L = P n {X). The 
Galerkin reformulation then yields the reduced dynamics of the y of equation (II. ip in 
this n-dimensional space. If L = P„(y), then: 

dL d - 
M = dt Pn{v) 



7 - ' j t y 



because P n is linear. Continuing, 

= ^ (m) 



dL 
~dt 



\ \ fe=l / / 

so that now things are expressed in terms of L. Evaluating the right hand side is 
a challenge often requiring quadrature or involving cumbersome analytical formulas 
(unless / is linear). 

When the slow manifold of the system is not a linear subspace, this method will 
not give much information about the underlying dimensionality of the slow manifold 
because the number of modes used is often based on some sort of error control and not 
on manifold geometry. It will also take more modes (larger n) than the dimensionality 
of the slow manifold to accurately represent the dynamics. Figure 14.11 illustrates 
this by showing a one-dimensional (nonlinear) curve lying in three-dimensional space, 
respresentative of the possibility of one-dimensional dynamics in a higher-dimensional 
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z 
A 



2 




X 

Fig. 4.1. A one- dimensional curve, representing a one- dimensional slow manifold in three- 
dimensional space, is shown (left). The curve is not a straight line, and it requires three POD 
modes to be well represented. Ideally, we would like to write the slow dynamics in terms of one 
"mode;" for example, (x(s),y(s), z(s)) as a function of arclength s (right). Such a parametrization 
can be found using DMAPs. 

space. Figure 14.21 shows, in terms of a Fourier basis for the solution (again, denoted 
ai, ct2, . . .) of the reaction-diffusion equation (|4.1[) . two slices of the two-dimensional 
(but nonlinear) inertial manifold of the (high-dimensional) reaction-diffusion equation; 
again, the long-term dynamics are low-dimensional while the ambient space dimension 
is high. In these situations, a technique is desired in which we may write the reduced 
dynamics of equation (|1.1|) on the actual, low-dimensional slow manifold regardless of 
its shape or of whether it can be approximated by a hyperplane. 

4.2. The DMAP approach. In contrast to POD, whose model reduction is 
based upon the discovery of a linear subspace which approximately contains the slow 
manifold, diffusion maps allow us to write the dynamics of equation ([l.lj) on the 
"true" low-dimensional (and possibly nonlinear) slow, attractive, invariant manifold. 

In this section, we use DMAPs to find a parametrization of the slow manifold and 
denote the dimensionality reduction procedure as L = $> n (X), or, as before, we drop 
the n and write L = $ (X), where L £ R n . Here, ^ is a transformation from the high- 
dimensional ambient space to the low-dimensional parametrization. In our working 
example, the Chafee-Infante reaction-diffusion equation (|4.ip . the inertial manifold 
is two-dimensional; we therefore expect VE^ to give us our manifold parametrization. 
In Figure 14.21 we show a picture of the two-dimensional reaction-diffusion inertial 
manifold in only three-dimensional ambient space for visualization purposes. 

We now derive the reduced dynamics of equation (jl.ll) . If L = ^(y) (and y = 
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CO 




Fig. 4.2. For a particular class of reaction- diffusion equations and for certain parameter values, 
an inertial manifold of dimension 2 exists [25], parametrized in Fourier space by the first two Fourier 
modes (01,02). For visualization purposes, we have shown two "slices" of this inertial manifold, 
03(01,02) and 05(01,02), in three- dimensions (with points colored by 03 and 05, respectively). It is 
clear that both of these slices are highly nonplanar and that any sort of two-dimensional POD plane 
would do a poor job passing through or near all these points. 



dt dt^ 



V'HL)), then: 

dL d 
dt 

_ dV(y) dy 
dy dt 

by the chain rule, where d gffl is a n x d matrix and ^| is a c?-dimensional vector. 
Continuing, 



dL 0VE(y) - 

-dt = ^- f[y) 



<9* 



(+-'(3) 

dy 



/(^(l)) (4.2) 



so that now things are expressed in terms of L. Ways to obtain ^ 1 (jjj were 
discussed above (we used polynomial interpolation). It is worth noting again that 
Nystrom extension, ^(y), will also return values for points y € R d which are neither 
exactly on our manifold nor in the original data ensemble. 

When new points X n ew (points not in our original data ensemble) such as ^E' _1 (L) 
arise in the computation, we obtain 4jS at such points as follows: 



1 M 

* (j) (^new) = T- K ^ new ) W 
J i—i 



1 E-liW(i,new)^ 
^ 12iLi W{i, new) 
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Therefore, 

X * dX^ w { £f=i^new) ) 
! Eu=i W(l,new) (^-^-W(i,new)j - *f ) 

Aj (Efii^.new)) 2 

It is really only necessary to compute W (i, new) for those X which lie near Xnew 
since for distances beyond e, contributions from other points either dies off (Gaussian 
kernel) or disappears completely (Epanechnikov kernel). 

Now that we have these formulas, we can compute the dynamics of equation (|4.2[) 
starting from an initial condition in DMAP space. Even if this initial condition is 
selected to be one of the known data points, immediately after the first integration 
step, we will find the trajectory visiting "new" points Anew that do not belong to our 
original data set. When such new points arise, we are able to compute the necessary 
time derivatives via equation (|4.3[) . Armed with these formulas, we can now continue 
the integration of equation (14. 2\ , constantly going back and forth between DMAP 
and ambient representations in order to obtain the time derivatives in DMAP space. 

To validate our approach, we compare two finite trajectories: 

1. a trajectory begun at X(0) € R d , integrated according to equation (|l.ip for 

time t 

2. a trajectory begun at ty(X(0)) G R" (DMAP space), integrated according 
to equation (|4.2p for time t, and transformed back into the original, ambient space of 
R d using 

We integrated these two trajectories very accurately using odell3 and absolute and 
relative error tolerances of le-7. A particular slice of these two 10-dimensional tra- 
jectories is shown in Figure l4~3"1 clearly, they visually coincide (and do so for all other 
slices as well). 

It is also interesting to observe that reduction in the number of variables also 
reduces the model problem's stiffness. Numerically, we observe that for any d, the 
magnitude of the largest absolute eigenvalue of the Jacobian of the reformulated sys- 
tem (|4.2[) for our problem is O(l), whereas the magnitude of the maximum absolute 
eigenvalue of the Jacobian of the original system (jl.ip is 0(d). The reason is that 
derivatives of $(X) are zero in directions orthogonal to the slow manifold, so stiff 
directions (which in this example are approximately orthogonal to the manifold) are 
effectively projected out of equation (|4.2I) . In fact, for certain error tolerances, com- 
puting the trajectories shown in Figure 14.31 using our approach takes less wall clock 
time than computing the trajectories via the original dynamics With absolute 

and relative error tolerances of 10 -6 , for instance, ode45 spends 2.11 seconds of wall 
clock time to compute the "new" dynamics, while the original dynamics takes 5.73 
seconds and three times as many calls to the derivative function. Explicit integrators 
are less susceptible to stability constraints due to the reduced stiffness (allowing fewer 
time steps), while implicit integrators fare well for an entirely different reason: the 
reduced dimensionality of the problem makes the computational linear algebra (in 
the Newton iterations required to solve the corresponding nonlinear equations of the 
implicit integrator) faster. Our purpose in this paper is to implement and discuss the 
procedures necessary for the formulation and solution of the reduced equations (|4.2p ; 



dX 



(fc) 
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Trajectory comparison 




Fig. 4.3. A visual comparison of the trajectories obtained through the original dynamics of 
equation (black) and the new dynamics of equation (purple), shown for visualization 

purposes in three-dimensional space (points are colored by their value of a-i). To perform this 
comparison, the trajectory corresponding to the reformulated dynamics had to be transformed into 
the original space using the i S/~ 1 (L). 

a more quantitative comparison of the cost of integration of equations (jl.ip and (|4.2I) 
is the subject of current research. 

There are two reasonable ways of quantifying the difference between the two sets 
of dynamics, the original dynamics (|1.1|) and the "new" dynamics of our approach 
(|4.2j) . The first is to compare, in ambient space, the trajectories computed by our 
approach with those computed according to the original dynamics as in Figure 14.31 
The second is to compare these two trajectories in DMAP space instead. Comparing 
trajectories in either DMAP space or ambient space requires first obtaining these 
trajectories, and thus making choices about the particular integrators; however, even 
when the two sets of equations (the original and the new) are integrated with great 
numerical accuracy, independent of cost, their trajectories will differ because of the 
interpolation (the 'I' -1 map) in equation (I4.2|) . Furthermore, the original dynamics 
have a different dimensionality and stiffness than the dynamics of our approach. We 
therefore choose to focus on only the error involved in the evaluation of the right hand 
side of equation (|4.2j) . the error in the DMAP-based dynamics. 

Suppose the integrator (in DMAP space) produces a new point L (in DMAP 
space). The derivative at L constructued by equation (|4.2[) is only approximate be- 
cause the "J -1 map is not exact (and this equation uses the 'J -1 map twice). We use 
Ax to denote the interpolation erroJl: 

(L) = * t -ue (L) + AX. 



3 It is possible for Ax (the error in 4 1 ) to become large near the boundary of the manifold 
in DMAP space 31 unless the DMAP algorithm is slightly modified; although we do not observe 
this behavior (Figure 14.51 for instance, shows the 1 error to be negligible), we note it and avoid 
integrating trajectories in this region. 
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Now, we wish to study the error in equation (I4.2[) due to Ax. Restricing our attention 
to a single coordinate j e 1,2, n of DMAP space, equation (|4.2p takes the following 
form: 



d L(9) d$M (£)) 



dt 



f V- 1 L 



(4.4) 



where, now, g gi g> is a 1 x d matrix. Expanding equation (|4.4p . and keeping only first 
order terms, we see that the error in coordinate g is given as: 



crr (9) = 



dt 



/(*t"rue (£)) 



Af T \_ V fee (i 



r72 



(*true (i))j(*true (£) ) Af, 



(4.5) 



where 9 g^ g> and J are d x d matrices (and J is the Jacobian of the function f). 
Combining these terms, we see that the overall error in the gth component is bounded 

by 



,.(<;) 



< 1 1 Ail 



d 2 $0?) 



dy 2 



where || || represents the matrix (vector) norm. Here, / should be small since it 
is evaluated "on manifold" at point ^true (Lj ; essentially, since we are on the slow 

can be shown to be bounded and 



a 2 $(s) 



manifold, only slow derivatives are present. 

O(l) under appropriate scaling [31] ■ Finally, when the fast directions (eigenvectors of 
J with large negative eigenvalues) are approximately orthogonal to the manifold, as 
they are in our reaction-diffusion illustrative example, the norm of the n x d matrix 
^|JrJ will be small (on the order of the eigenvalues of the slow subspace); the kernel 

of 4jS consists of the directions orthogonal to the manifold. 

4.3. Nonlinear Galerkin techniques. An alternative approach to obtaining 
accurate reduced discretizations of dissipative PDEs is provided by the so-called "non- 
linear Galerkin" techniques in the context of approximate inertial manifolds. Instead 
of a Galerkin projection onto a large number of orthogonal eigenf unctions, leading 
to a large set of coupled ODEs, one (approximately) expresses the component of the 
solution in the "higher" modes as a function of its components in the "lower" modes 

(see, e.g., [g| Ea [13 EEI El 123 E3 EHl HSI). 

For the two-dimensional reaction-diffusion inertial manifold above, for instance, it 
is known that one can approximate the coefficients of each of the higher-order Fourier 
modes (03,04,05, ...) on the slow manifold as functions of just the first two Fourier 
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modes, Oi and a^- The nonlinear Galerkin formulation will then depend on only two 
variables, the first two Fourier modes, and the reduced dynamics are two-dimensional. 
Two popular techniques to approximate the slaving of the higher to the lower-order 
modes include the so-called "steady" and the "Euler-Galerkin" approximate intertial 
manifolds (AIMs). 

The steady approximation for our reaction-diffusion problem is implemented by 
setting 03 = 0, 0,4 = 0, as = 0, from which we obtain the functions 03(01,02), 
04(01, 02), 05(01, 02), ... We evolved the steady approximate inertial form using MAT- 
LAB's built-in Differential Algebraic Equation (DAE) solver odel5s. At 3.44 seconds, 
this code took longer than both the original and the DMAP-based dynamics; this 
may very well be because, even though the system is conceptually two-dimensional, 
the linear algebra computations in the DAE solver are performed in a higher (here, 
10) dimensional space. 

In the Euler-Galerkin approximation the functions {0^(01, 02)}*; are obtained 
by constraining ai and 02 to be constant, and taking an implicit Euler step (for 
appropriately chosen size r) of the constrained dynamics of the higher-order Fourier 
modes. Here, as in |14j . we used r = 1, which is large enough for the higher-order 
modes to get slaved to the first two modes. Instead of solving the nonlinear equations 
resulting from the implicit Euler step, a fixed point iteration is implemented (and in 
fact, only a single substitution is performed, since the map is so contracting). For 
simplicity, we consider only 01,02, and 03 and set 04 = 05 = ■ • • = 0. Then the 
constrained 03 dynamics are given by 

a 3 = a 3 diffusive a 3 reactive 

= (— 9^a 3 ) + (a 3 — 303 — 60^03 — 60^03 + — 30102). 

Taking one implicit Euler step of length t while holding ai and 02 constant, and 
using as initial condition a 3 (0) = (0 is a good initial guess since higher-order modes 
quickly become small), we obtain 

03(1") = 03(0) + to 3 (t) 

= + r [a 3 diffusive ( a i' a 2' a 3( r )) + ^reactive ( a i> a 2> a 3( T ))] • 
Moving the diffusive term to the left hand side, 

(1 + 9ri/)o 3 (r) = t[o 3 (t) - 3oJ(t) - 6o^o 3 (t) - 6a?a 3 (r) + a\ - 3aia|]. (4.6) 

The map 03 1+ t 9tu (03 — 3a| — 60^03 — 60^03 + af — 30102) can be shown to be a 
contraction, intuitively because of the large eigenvalues associated with the diffusion 
term. Hence, instead of solving equation (|4.6p . we invoke the method of successive 
substitutions with initial guess 03 = to obtain the approximation 03 = 1+ T 9ru (of — 
30103). 

A comparison of the exact manifold (from accurate full simulations) , the steady 
AIM, and the Euler-Galerkin AIM can be seen in Figure l4~4l In Figure l4~5l we show 
the magnitude of the error (the norm of the vector 

( a 3 computed ~ a 3 exact) ( a 4 computed _ °4 exact) •■• 

of errors) for the manifold constructed by our inverse Nystrom map (essentially a 
polynomial interpolation of points on the exact manifold), the steady AIM, and the 
Euler-Galerkin AIM. 
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Exact inertial manifold "Steady" manifold Euler-Galerkin manifold 




Fig. 4.4. A comparison of the exact inertial manifold with the two nonlinear Galerkin AIMs 
described in %4-S\ (see text). 



Inverse Nystrom error "Steady" error 
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Fig. 4.5. A comparison of the errors of our inverse Nystrom manifold (the "interpolated 
manifold" used to evolve equation \4-Sty), the steady AIM, and the Euler-Galerkin AIM (see text). 



We verified numerically that an isomorphism exists between the first two DMAP 
coordinates and the first two Fourier modes for points on the inertial manifold. This 
was established by verifying that the determinant of the Jacobian of the trans- 
formation from two-dimensional DMAP space to (01,02) Fourier space is every- 
where nonzero. This nonsingularity of the Jacobian is also suggested by Figure 
14.61 As expected, the DMAP correctly captures the two-dimensional geometry of 
the slow manifold; the DMAP eigenvalues we computed for the data using e = 0.2 
are (1.0, 0.8477, 0.7504, 0.4585, 0.4155, ...), and the first two nontrivial eigenvalues are 
clearly larger than the rest. 

5. Conclusion. In this paper we demonstrated a link between manifold learn- 
ing techniques (and, in particular, diffusion maps) and model reduction for dissipative 
evolutionary PDEs. The approach is data-based, and it provides an interesting ("non- 
linear" ) alternative to the the well-known Galerkin projection of the dynamics on the 
leading principal components of the data ensemble. The implementation of the algo- 
rithm is only slightly more involved than that of the classic POD-Galerkin method, 
and is often faster due to decreased stiffness and a more parsimonious dimension- 
ality reduction; the latter case occuring with low-dimensional but highly nonlinear 
(non-flat) slow manifolds residing in high-dimensional ambient spaces. 



18 



B. E. SONDAY, A. SINGER, C. W. GEAR, AND I. G. KEVREKIDIS 



1 st DMAP Coord. 2nd DMAP Coord. 




Fig. 4.6. A plot of the first DMAP coordinate (left) and the second DMAP coordinate (right) 
vs. the first two Fourier modes (a\ and 02). The figure strongly suggests that the determinant of 
the Jacobian is everywhere nonzero, or equivalently , that the transformation between the first two 
DMAP coordinates and the first two Fourier modes is unique; on the left, a\ appears to be one-to- 
one with the first DMAP coordinate, and on the right, once a value of ai (equivalently, the first 
DMAP coordinate) is specified, 02 one-to-one with the second DMAP coordinate. 

Instead of evaluating dL/dt from the right hand side of equation (|4.2[) . it is also 
possible to perform a short integration in physical space, transform the resulting 
trajectory into DMAP space, and use the transformed trajectory to estimate dL/dt 
"on demand." This is reminiscent of the "Galerkin-free" computations of [33]) where 
this approach was exploited to avoid the construction of the right-hand-side of a POD- 
Galerkin dynamical system. In that case, short bursts of physical simulation observed 
in POD space were used to estimate the time-derivatives of the POD components of 
the solution; these provided the input to projective integrators (see, e.g., [28l f2Tj ) . 

Having extrapolated the POD component values to future times, it is easy to 
construct physical initial conditions consistent with these values. In our case, however, 
having extrapolated the DMAP solution coordinates to future times, it is more difficult 
to find off-sample physical initial conditions consistent with these extrapolated values; 
our inverse Nystrom map ('I' -1 ) relied on polynomial interpolation. 

Our work started with an available ensemble of points on the manifold, without 
any sense of the trajectories from which these points were sampled; if we have the 
trajectories themselves, as well as time-derivatives (evaluated or estimated) at every 
sample point, then adaptive tabulation tools (see, e.g., [37]) can again help circumvent 
the cumbersome computation of the right-hand-side of equation (|4.2I) . 

We believe that manifold-learning techniques of the type we discussed here, and 
the reduced models they lead to, may be particularly useful for tasks for which small 
dimensionality is crucial (such as the approximation of stable manifolds or visualiza- 
tion of the dynamics). 

6. Acknowledgments. I. G. K. and A. S. are pleased to acknowledge motivat- 
ing discussions with Dr. R. Erban and Prof. M. S. Jolly. 
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